Wonderful Wines of the World

Initial Setup

Define some constant variables

Define some functions

Load Dataset

Data Understanding

Check datatypes

All variables are numeric.

Check for duplicates

Identify features for segmentation

Distributions of values for variables

Explore the Recency variable

Recency gets a closer look because of its unusual distribution.

Treat the points with Recency > 100 as "Lost Customers" and place them in their own cluster. The relationship between Recency and Frequency in particular suggests that customers are unlikely to repurchase once more than 100 days have passed since their last transaction.
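A minimal sketch of that split, assuming the data sits in a pandas DataFrame with a column named Recency measured in days. The helper name split_lost_customers and the toy values are illustrative and do not come from the original notebook.

```python
import pandas as pd

RECENCY_CUTOFF = 100  # days; the threshold stated in the text above


def split_lost_customers(df, recency_col="Recency", cutoff=RECENCY_CUTOFF):
    """Return (active, lost); lost rows have recency strictly above the cutoff."""
    lost_mask = df[recency_col] > cutoff
    return df[~lost_mask].copy(), df[lost_mask].copy()


# Toy data for illustration only (not the real dataset).
customers = pd.DataFrame(
    {"Recency": [5, 40, 100, 101, 350], "Frequency": [12, 8, 3, 1, 1]}
)
active, lost = split_lost_customers(customers)
```

Only the active rows would then go into the clustering steps, while the lost rows keep a fixed label of their own.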

Visualize Component Planes

Data Preparation

Check for correlation

Check for outliers

There are too many outliers to remove based on the IQR rule alone.
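One way to quantify this is to count how many rows the IQR rule would drop. The sketch below assumes the conventional 1.5 × IQR fences, which the original does not state; the function name iqr_outlier_mask and the toy frame are hypothetical.

```python
import pandas as pd


def iqr_outlier_mask(df, k=1.5):
    """Boolean frame: True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df.quantile(0.25)
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    return (df < q1 - k * iqr) | (df > q3 + k * iqr)


# Toy data for illustration only.
toy = pd.DataFrame({"a": [1, 2, 3, 4, 100], "b": [10, 11, 12, 13, 14]})
mask = iqr_outlier_mask(toy)
rows_flagged = mask.any(axis=1)      # a row is flagged if any feature is out of range
share_flagged = rows_flagged.mean()  # fraction of rows that would be removed
```

If share_flagged is large on the real data, dropping every flagged row would discard a sizeable part of the customer base, which is the concern raised above.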

Transform Variables

MinMax Scaler

We scale only the Value Segmentation Features, because the Wine Segmentation Features are already on the same scale (percentages).
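A minimal sketch of scaling one feature group and leaving the other untouched, using scikit-learn's MinMaxScaler. The column names in value_features and wine_features are placeholders, since the original does not list which columns belong to each group.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical column groupings; the real feature lists are not given in the text.
value_features = ["Income", "Monetary"]
wine_features = ["Dryred", "Sweetwh"]

# Toy data for illustration only.
df = pd.DataFrame(
    {
        "Income": [20000.0, 50000.0, 80000.0],
        "Monetary": [100.0, 250.0, 400.0],
        "Dryred": [50.0, 20.0, 70.0],
        "Sweetwh": [10.0, 30.0, 5.0],
    }
)

scaled = df.copy()
# Rescale each value feature to [0, 1]; wine features are left as they are.
scaled[value_features] = MinMaxScaler().fit_transform(df[value_features])
```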

Transform wine features to decimal (from percentage)

Use DBSCAN to identify 'noise' rows
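A sketch of how DBSCAN can flag noise rows: scikit-learn assigns the label -1 to points that belong to no dense region. The eps and min_samples values and the synthetic data below are assumptions chosen for the toy example, not the settings used in the original analysis.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic data: one dense blob plus two stray points far from it.
rng = np.random.default_rng(0)
dense = rng.normal(loc=0.5, scale=0.02, size=(50, 2))
stray = np.array([[0.0, 0.0], [1.0, 1.0]])
X = np.vstack([dense, stray])

# eps and min_samples are illustrative and would need tuning on real data.
labels = DBSCAN(eps=0.15, min_samples=5).fit_predict(X)

noise_mask = labels == -1  # DBSCAN marks noise with the label -1
X_core = X[~noise_mask]    # rows kept for the main clustering
X_noise = X[noise_mask]    # rows set aside and assigned to clusters later
```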

Summary of Variables

Clustering

RFM Analysis

Some helpful functions

Wine Segmentation

Compare cluster means of different K
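A sketch of one way to build such a comparison, assuming k-means as the clustering algorithm (the text here does not say which algorithm was used). The helper cluster_means_by_k and the column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def cluster_means_by_k(df, ks, random_state=0):
    """Fit k-means for each k and return {k: DataFrame of per-cluster feature means}."""
    means = {}
    for k in ks:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(df)
        means[k] = df.groupby(labels).mean()
    return means


# Toy data with placeholder wine-preference columns (values in [0, 1]).
rng = np.random.default_rng(0)
wine = pd.DataFrame(rng.random((60, 3)), columns=["Dryred", "Sweetred", "Drywh"])
means_by_k = cluster_means_by_k(wine, ks=[2, 3, 4])
```

Each entry of means_by_k has one row per cluster, so the tables for different K can be read side by side to judge whether an extra cluster adds a distinct profile.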

Characterize final wine segmentation clusters

Visualize wine segmentation clusters

Value Segmentation

Characterize value segmentation clusters

Visualize Value Segmentation

Merging cluster solutions

Visualizing merged cluster solution

Merging Wine and RFM

Merging Value and RFM

Characterizing final clusters

Exploring the clusters

Classify and predict noise and outlier rows
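A sketch of the idea, assuming a decision tree is trained on the rows that already carry a cluster label and then used to assign the rows that were set aside as noise or outliers. The synthetic data, the max_depth value, and the variable names are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic "already clustered" rows: two well-separated groups with known labels.
rng = np.random.default_rng(0)
X_clustered = np.vstack(
    [rng.normal(0.2, 0.03, size=(40, 2)), rng.normal(0.8, 0.03, size=(40, 2))]
)
y_clustered = np.array([0] * 40 + [1] * 40)

# Rows previously excluded from clustering (toy values).
X_noise = np.array([[0.0, 0.05], [0.95, 1.0]])

# A shallow tree keeps the rules readable; the depth is an assumption.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_clustered, y_clustered)
noise_labels = tree.predict(X_noise)  # predicted cluster for each excluded row
```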

Visualize Decision Tree

Supporting Visualizations

Ideal cluster sizes: silhouette plots
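The sketch below computes the average silhouette score for several values of K, which is the number a silhouette plot summarizes. It assumes k-means and uses synthetic data; silhouette_by_k is a hypothetical helper, and the plotting itself is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, ks, random_state=0):
    """Return {k: mean silhouette score} for k-means fits with each k."""
    scores = {}
    for k in ks:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


# Synthetic data: three well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([rng.normal(c, 0.5, size=(30, 2)) for c in centers])

scores = silhouette_by_k(X, range(2, 6))
best_k = max(scores, key=scores.get)  # K with the highest mean silhouette
```

A higher mean silhouette indicates tighter, better-separated clusters, so best_k is a reasonable starting candidate rather than a final answer.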

Wine Segmentation

Value Segmentation

Other things we tried

Self Organizing Maps

Value

Visualize

Wine

Visualize

Mean Shift Clustering

Value

Wine

DBSCAN

Value

Wine